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Abstract 

A useful approach to the mathematical analysis of large-scale biological networks is based 
upon their decompositions into monotone dynamical systems. This paper deals with two 
computational problems associated to finding decompositions which are optimal in an appro- 
priate sense. In graph-theoretic language, the problems can be recast in terms of maximal 
sign-consistent subgraphs. The theoretical results include polynomial-time approximation al- 
gorithms as well as constant-ratio inapproximability results. One of the algorithms, which has 
a worst-case guarantee of 87.9% from optimality, is based on the semidefinite programming 
relaxation approach of Goemans- Williamson [20]. The algorithm was implemented and tested 
on a Drosophila segmentation network and an Epidermal Growth Factor Receptor pathway 
model, and it was found to perform close to optimally. 
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1 Introduction 



In living cells, networks of proteins, RNA, DNA, metabolites, and other species process environ- 
mental signals, control internal events such as gene expression, and produce appropriate cellular 
responses. The field of systems (molecular) biology is largely concerned with the study of such 
networks, viewed as dynamical systems. One approach to their mathematical analysis relies upon 
viewing them as made up of subsystems whose behavior is simpler and easier to understand. Cou- 
pled with appropriate interconnection rules, the hope is that emergent properties of the complete 
system can be deduced from the understanding of these subsystems. Diagrammatically, we picture 
this as in Figure 1, which shows a full system as composed of four subsystems. 
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Figure 1: A system composed of four subsystems 



A particularly appealing class of candidates for "simpler behaved" subsystems are monotone 
systems, as in [23,24,42]. Monotone systems are a class of dynamical systems for which patholog- 
ical behavior ("chaos") is ruled out. Even though they may have arbitrarily large dimensionality, 
monotone systems behave in many ways like one- dimensional systems. For instance, in monotone 
systems, bounded trajectories generically converge to steady states, and there are no stable oscil- 
latory behaviors. More precisely, see below, one must extend the notion of monotone system so as 
to incorporate input and output channels, as introduced and initially developed in [5]; inputs and 
outputs are required so that interconnections like those shown in Figure 1 can be defined. 

Monotonicity is closely related, as explained later, to positive and feedback loops in systems. 
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The topic of analyzing the behaviors of such feedback loops is a long-standing one in biology in 
the context of regulation, metabolism, and development; a classical reference in that regard is the 
work [33] of Monod and Jacob in 1961. See also, for example, [3,6,11,28,32,39,40,44,47]. 

An interconnection of monotone subsystems, that is to say, an entire system made up of mono- 
tone components, may or may not be monotone: "positive feedback" (in a sense that can be made 
precise) preserves monotonicity, while "negative feedback" destroys it. Thus, oscillators such as cir- 
cadian rhythm generators require negative feedback loops in order for periodic orbits to arise, and 
hence are not themselves monotone systems, although they can be decomposed into monotone sub- 
systems (cf. [7]). A rich theory is beginning to arise, characterizing the behavior of non-monotone 
interconnections. For example, [5] shows how to preserve convergence to equilibria; see also the 
follow-up papers [4, 15, 18, 19, 26]. Even for monotone interconnections, the decomposition approach 
is very useful, as it permits locating and characterizing the stability of steady states based upon 
input/output behaviors of components, as described in [6]; see also the follow-up papers [3,17,27]. 

Moreover, a key point brought up in [45] is that new techniques for monotone systems in many 
situations allow one to characterize the behavior of an entire system, based upon the "qualitative" 
knowledge represented by general network topology and the inhibitory or activating character of 
interconnections, combined with only a relatively small amount of quantitative data. The latter 
data may consist of steady-state responses of components (dose-response curves and so forth), and 
there is no need to know the precise form of dynamics or parameters such as kinetic constants in 
order to obtain global stability conclusions. 

In Section 2 of this paper, we briefly discuss monotonicity of systems described by ordinary 
differential equations (the study of monotonicity can be extended to partial differential equations, 
delay-differential equations, and even more arbitrary dynamical systems, see e.g. [18] in the context 
of monotone systems with inputs and outputs). We explain there how the study of monotone 
systems, and more generally of decompositions into monotone systems, relates to a sign- consistency 
property for the graph which describes how each state variable influences each other variable in a 
given system. 

Generally, a graph, whose edges are labeled by "+" or "— " signs (sometimes one writes +1, — 1 
instead of +, — , or uses respectively activating "— or inhibiting "H" arrows as shown in Figure 2), is 
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Figure 2: A consistent and an inconsistent graph 



said to be sign- consistent if all paths between any two nodes have the same net sign, or equivalently, 
all closed loops have positive parity, i.e. an even number, possibly zero, of negative edges. (For 
technical reasons, one ignores the direction of arrows, looking only at undirected graphs; see more 
details in Section 2.) Thus, the first graph in Figure 2 is consistent, but the second one, which 
differs in just one edge from the first one, is not (two paths with different parity are shown). 

When applying decomposition theorems such as those described in [3-6, 15, 17-19, 26, 27, 45], it 
tends to be the case that the fewer the number of interconnections among components, the easier 
it is to obtain useful conclusions. One may view a decomposition into interconnections of mono- 
tone subsystems as the "pulling out" of "inconsistent" connections among monotone components, 
the original system being a "negative feedback" loop around an otherwise consistent system, as 
represented in Figure 3. In this interpretation, the number of interconnections among monotone 



components corresponds to the number of variables being fed-back. In addition, and independently 
from the theory developed in the above references, one might speculate that nature tends to favor 
systems that are decomposable into small monotone interconnections, since "negative" feedback 
loops, although required for homeostasis and for periodic behavior, have potentially destabilizing 
effects, especially if there are signal propagation delays. Some evidence is provided by work in 
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Figure 3: Pulling-out inconsistent connections 
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progress such as [31], where the authors compare certain biological networks and appropriately ran- 
domized versions of them and show that the original networks are closer to being consistent, and 
by [41], where the authors show that, in a Boolean setting, and using a mean-field calculation of 
sensitivity, networks of Boolean functions behave in a more and more "orderly" fashion the closer 
that the components are to being monotone. 

Thus, we are led to the subject of this paper, namely computing the smallest number of edges 
that have to be removed so that there remains a consistent graph. For example, for the particular 
graph shown in Figure 4 the answer is that one edge (the diagonal positive one) suffices, and it is 



worth remarking that no single other edge would suffice. 

In this paper, we will study the computational complexity of the question of how many edges 
must be removed in order to obtain consistency, and we provide a relaxation-based polynomial-time 
approximation algorithm guaranteed to solve the problem to about 87.9% of the optimum solution, 
which is based on the semidefinite programming relaxation approach of Goemans- Williamson [20] 
(A variant of the problem is discussed as well). We also observe that it is not possible to have 
a polynomial-time algorithm with performance too close to the optimal. While our emphasis is 
on theory, one of the algorithms was implemented, and we show results of its application to a 
Drosophila segmentation network and to an Epidermal Growth Factor Receptor pathway model. It 
turns out that, when applying the algorithm, often the solution is much closer to optimal than the 
worst-case guarantee of 87.9%, and indeed often gives an optimal solution. 

The organization of this paper is as follows. 

Section 2 briefly discusses monotonicity. The discussion is self-contained for the purposes of this 
paper, and references are given to the dynamical systems results that motivate the problem studied 
here. The connection to consistency is also explained there. Section 3 discusses the associated 




Figure 4: Dropping the diagonal edge gives consistency 
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graph-theoretic problems and notions of approximability used in the paper, leading to the statement 
of our main theoretical results in Section 4, which are proved in Section 5. Section 6 contains the 
mentioned examples of application of the algorithm. Several technical proofs are separately provided 
in an Appendix. 

2 Monotone Systems and Consistency 

We will illustrate the motivation for the problem studied here using systems of ordinary differential 
equations 



(the dot indicates time derivative, and x = x(t) is a vector), although the discussion applies as 
well to more general types of dynamical systems such as delay-differential systems or certain sys- 
tems of react ion- diffusion partial differential equations. In applications to biological networks, the 
component Xi(t) of the vector x = x(t) indicates the concentration of the ith species in the model 
at time t. We will restrict attention to models in which the direct effect that one given variable 
in the model has over another is either consistently inhibitory or consistently promoting. Thus, if 
protein A binds to the promoter region of gene B, we assume that it does so either to consistently 
prevent the transcription of the gene or to consistently facilitate it. (Of course, this condition does 
not prevent protein A from having an indirect influence, through other molecules, perhaps dimmers 
of A itself, that can ultimately lead to the opposite effect on gene B.) Mathematically, we require 
that for every i, j = 1 . . . n, i ^ j, the partial derivative dFi/dxj be either > at all states or < 
at all states. 

Given any partial order < defined on M. n , a system (1) is said to be monotone with respect to < 
if < yo implies x(t) < y(t) for every t > 0. Here x(t), y(t) are the solutions of (1) with initial 
conditions x , y , respectively. Of course, whether a system is monotone or not depends on the 
partial order being considered, but we one says simply that a system is monotone if the order is 
clear from the context. Monotonicity with respect to nontrivial orders rules out chaotic attractors 
and even stable periodic orbits; see [23,24,42], and is, as discussed in the introduction, a useful 
property for components when analyzing larger systems in terms of subsystems. 




(1) 
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A useful way to define partial orders in M. n , and the only one to be further considered in this 
paper, is as follows. Given a tuple s = (si, . . . s n ), where Sj G {1, —1} for every i, we say that x < s y 
if SiXi < SiDi for every i. For instance, the "cooperative order" is the orthant order < s generated 
by s = (1, ... 1). This is the order < defined by x < y if and only if Xi < yi for alH = 1, . . . , n. It 
is not difficult to verify if a system is cooperative with respect to an orthant order; the following 
lemma, known as "Kamke's condition," is not hard to prove, see [42] for details (also [5] in the more 
general context of monotone systems with input and output channels). 

Lemma 1 Consider an orthant order < s generated by s = (si, . . . , s n ). A system (1) is monotone 
with respect to < s if and only if 

dF- 

SiSj -r^- > 0, i, j = 1 . . . n, i^j. (2) 

To provide intuition, let us sketch the sufficiency part of the proof for the special case of the 
cooperative order. Suppose by contradiction that the system is not monotone, and that therefore 
there is a pair of initial conditions xq < yo whose solutions x(t), y(t) cease to satisfy x(t) < y(t) at 
some point. This implies that at a certain critical moment in time t, there is some coordinate i so 
that Xi(t~) < Ui(t~) but Xi(t + ) > yi(t + ). (This argument is not entirely accurate, but it gives the 
flavor of the proof.) Thus Xi(t) = yi{t) for some % and the derivative with respect to time of is 
larger than that of y^ at time t, meaning that that F^x) > F^y), where x = Xi(t) and y = yi(t). 
However, this cannot happen if Fi is increasing on all the variables Xj except possibly Xi, so that 
x < y, Xi = yi implies Fi(x) < Fi(y). An equivalent way to phrase this condition is by ask that 
dFi/dxj > at all states for every % ^ j, which is the Kamke condition for the special case of 
the cooperative order. The name of the order arises because in a monotone system with respect to 
that order each species promotes or "cooperates" with each other. 

A rephrasing of this characterization of monotonicity with respect to orthant orders can be given 
by looking at the signed digraph associated to (1) and defined as follows. Let V(G) = {1, . . . , n}. 
Given vertices i, j, let G E(G) and /e(i, j) = 1 if both dFj/dxi > and the strict inequality 
holds at least at one state. Similarly let (i,j) G E{G) and j) = —1 if both dFj/dxi < and 
the strict inequality holds at least at one state. Finally, let (z, j) ^ E(G) if dFj/dxi = 0. Recall 
that we are assuming that one of the three cases must hold. 
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Now we can define an orthant cone using any function fy : V(G) — > { — 1, 1}, by letting x </ v y 
if and only if fy(i)xi < fv(i)yi for all i. Given fy, we define the consistency function g : E{G) — > 
{true, false} by g(i,j) = fv(i)fv(j)fE(i,j)- Then, the following analog of Lemma 1 holds. 

Lemma 2 Consider a system (1) and an orthant cone </ v . Then (1) is monotone with respect to 
<f v if and only if g(i,j) = 1 on E(G). 

Proof. Let s; = f v (i), i = l...n. Note that SiSjdf/dxj = if £ E(G). For e E(G), 
it holds that SiSjdfi/dxj > if and only if SiSjfE(i,j) = 1, that is, if and only if g(i,j) = 1. The 
result follows from Lemma 1. □ 

For the next lemma, let the parity of a chain in G be the product of the signs (+1, —1) of its 
individual edges. We will consider in the next result closed undirected chains, that is, sequences 
Xi x . . . x ir such that x iY = x ir , and such that for every A = 1, . . . , r — 1 either (x^x, £i,a+i) £ E(G) 
or (x^x+ux^x) e E(G). 

The following lemma (see [14] as well as [43, page 101]) is analogous to the fact from vector 
calculus that path integrals of a vector field are independent of the particular path of integration 
if and only if there exists a potential function. Since the result is key to the formulation of the 
problem being considered, we provide a simple and self-contained proof in an Appendix. 

Lemma 3 Consider a dynamical system (1) with associated directed graph G. Then (1) is mono- 
tone with respect to some orthant order if and only if all closed undirected chains of G have parity 
1. 

2.1 Systems with Inputs and Outputs 

As we discussed in the introduction, a useful approach to the analysis of biological networks consists 
of decomposing a given system into an interconnection of monotone subsystems. The formulation of 
the notion of interconnection requires subsystems to be endowed with "input and output channels" 
through which information is to be exchanged. In order to address this we consider controlled 
dynamical systems ([46], which are systems with an additional parameter u G IR m , and which have 



S 



the form 

x = g(x,u). (3) 

The values of u over time are specified by means of a function t — *■ u(t) G R m , t > 0, called an 
input pr control. Thus each input defines a time-dependent dynamical system in the usual sense. 
To system (3) there is associated a feedback function h : W 1 — » R m , which is usually used to create 
the closed loop system x = g(x,h(x)). Finally, if M n ,IR m are ordered by orthant orders <f v , < g 
respectively, we say that the system is monotone if it satisfies (2) for every u, and also 

dg ■ 

qkfvU) ^— ^ °> for ever y ( 4 ) 

(see also [5].) As an example, let us consider the following biological model of testosterone dynam- 
ics [16,34]: 

A 

X\ = -jrr^. 0\X\ 

x 2 = c\X\ - b 2 x 2 (5) 
x 3 = c 2 x 2 - b 3 x 3 . 

Drawing the digraph of this system, it is easy to see that it is not monotone with respect to any 
orthant order, as follows by application of Lemma 3. On the other hand, replacing x 3 in the first 
equation by u, we obtain a system that is monotone with respect to the orders < (1,1,1) ' — (-1) f° r 
state and input respectively. Defining h(x) = x 3 , the closed loop system of this controlled system is 
none other than (5). The paper [16] shows how, using this decomposition together with the "small 
gain theorem" from monotone input /output theory ([5]) leads one to a proof that the system does 
not have oscillatory behavior, even under arbitrary delays in the feedback loop, contrary to the 
assertion made in [34]. 

We can carry out this procedure on an arbitrary system (1) with a directed graph G, as follows: 
given a set E of edges in G, enumerate the edges in E c as ji), . . . (i m ,jm)- F° r every k = 1 . . .m, 
replace all appearances of Xi k in the function Fj k by the variable Uk, to form the function g(x,u). 
Define h(x) = (x^, . . -Xi m ). It is easy to see that this controlled system (3) has closed loop (1). 

Note that the controlled system (3) generated by the set E as above has, as associated digraph, 
the sub-digraph of G generated by E. This is because for every k, one has dgj k (x, u)/dx ik = 0, i.e., 
the edge from i^ to jk has been "erased" . 
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Let the set E be called consistent if the undirected subgraph of G generated by E has no closed 
chains with parity —1. Note that this is equivalent to the existence of fv such that g = 1 on E, by 
Lemma 4 applied to the open loop system (3). If E is consistent, then the associated system (3) 
itself can also be shown to be monotone: to verify condition (4), simply define each so that (4) is 
satisfied for k, jV Since dgj k /dui~ = dFj k /dxi k ^ 0, this choice is in fact unambiguous. Conversely, 
if (3) is monotone with respect to the orthant orders </y, < g , then in particular it is monotone for 
every fixed constant u, so that E is consistent by Lemma 3. We thus have the following result. 

Lemma 4 Let E be a set of edges of the digraph G. Then E is consistent if and only if the 
corresponding controlled system (3) is monotone with respect to some orthant orders. 

3 Statement of Problem 

A natural problem is therefore the following. Given a dynamical system (1) that admits a digraph 
G, use the procedure above to decompose it as the closed loop of a monotone controlled system 
(3), while minimizing the number \\E C \\ of inputs. Equivalently, find f v such that P(E + ) =\\E + \\ is 
maximized and P(EJ) =||£L||=||£7V]| minimized. This produces the following problem formulation. 

Problem 1 (Undirected Labeling Problem(£/LP)) : 

An instance of this problem is (G, h), where G = (V, E) is an undirected graph and h: E i— > {0, 1}. A 
valid solution is a vertex labeling function f: V — * {0, 1}. Define an edge {u, v} G E to be consistent 
iffh(u,v) = (f(u) + f{v)) (mod 2). The objective is then to find a valid solution maximizing \F\ 
where F is the set of consistent edges. 

That U LP is a correct formulation for our problem is confirmed by the following easy equivalence. 

Proposition 1 Consider an instance (G, h) of ULP with an optimal solution having x consistent 
edges given by a vertex labeling function f . Let D be a set of edges of smallest cardinality that have 
to be removed such that for the remaining graph, that is the graph G' = (V,E\ D) with the same 
vertex set V but an edge set E\D, there exists a vertex labeling function /': V — ■> {0, 1} that makes 
every edge consistent. Then, x = \E\ — \D\. 
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Proof. Since / produces a solution of ULP with x consistent edges, exactly \E\ — x edges are 
inconsistent, thus \D\ < \E\ — x, that is, x < \E\ — \D\. Conversely, since there is a solution with 
l^l — \D\ consistent edges, x > \E\ — \D\. □ 

A special case of ULP, namely when h(e) = 1 for all e e E, is the MAX-CUT problem (defined 
in Section 3.1). Moreover, ULP can be posed as a special type of "constraint satisfaction problem" 
as follows. We have \E\ linear equations over GF(2), one equation per edge and each equation 
involving exactly two variables, over \V\ Boolean variables. The goal is to assign values to the 
variables to satisfy the maximum number of equations. For algorithms and lower-bound results for 
general cases of these types of problems, such as when the equations are over GF(p) for an arbitrary 
prime p > 2, when there are an arbitrary number of variables per equation or when the goal is to 
minimize the number of unsatisfied equations, see references such as [2, 9, 12, 22] and the references 
therein. 

Given orthant orders < f v and < q for M. n and M m respectively, we say that a feedback function 
h is positive if x <f v y implies h(x) < q h(y), and that it is negative if x <f v y implies h(x) > q h(y). 
It can be shown that the closed loop of a monotone system with a positive feedback function is 
actually itself monotone, so that no system can be produced in this way that was not monotone 
already. But if h is a negative feedback function, then several results become available which use the 
methods of monotone systems for systems that are not monotone, see [5, 16, 18]. For the following 
result, let (C, C) be the class of consistent subsets of E(G), ordered under inclusion. 

Proposition 2 Let E be a consistent set. Then E is maximal in (C, C) if and only if h is a negative 
feedback function for every fv such that g = 1 on E. 

Proof. Suppose that E is maximal, and let fv be such that g = 1 on E. Given any edge 
(ik,jk) £ E c , it holds that g{ik,3k) = —1- Otherwise one could extend E by adding (ik,jk), 
thus violating maximality. That is, fv{^k)fv{3k)fE{ikijk) — — 1- By monotonicity, it holds that 
Qkfv{jk)dgj k /duk > 0, and since dgj k /duk = dFj k /dx ik , it follows necessarily that 

Qkfv(jk)fE(ik,jk) = 1. 

Therefore it must hold that qk = —fv{ik) for each k, which implies that h is a negative feedback 
function. 
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Conversely, if fy is such that g = 1 on E and h is a negative feedback function, then = 
—fv(h)- By the same argument as above, qkfv(jk)fE(ik, jk) = 1 for all k by monotonicity. Therefore 
g = — 1 on E . Repeating this for all admissible fy, maximality follows. □ 

There is a second, slightly more sophisticated way of writing a system (1) as the feedback 
loop of a system (3) using an arbitrary set of edges E. Given any such E, define S(E C ) = 
{i | there is some j such that G E c }. Now enumerate S(E C ) as . . .i m }, and for each k 

label the set {j \ (ik,j) G E c } as jki,jk2, ■ ■ ■■ Then for each k, I, one can replace each appearance of 
x ik in Fj kl by Uk, to form the function g(x,u). Then one lets h(x) = (x^, . . . ,x im ) as above. The 
closed loop of this system (3) is also (1) as before but with the advantage that there are \S(E C )\ 
inputs, and of course \S(E C )\ < \E C \. 

If E is a consistent and maximal set, then one can make (3) into a monotone system as follows. 
By letting fy be such that g = 1 on E, we define the order <f v on IR n . For every ik,jki such that 
{h,jki) e E c , it must hold that fv(ik)fv(jki)fE(ik,jki) = -1- Otherwise £ U {(z fe , j k i)} would be 
consistent, thus violating maximality. By choosing q^ = —fv(ik), equation (4) is therefore satisfied. 
See the proof of Proposition 2. Conversely, if the system generated by E using this second algorithm 
is monotone with respect to orthant orders, and if h is a negative function, then it is easy to verify 
that E must be both consistent and maximal. 

Thus the problem of finding E consistent and such that P(EJ) =||5'(-E'_)|| = ||S'(£' C ')|| is smallest, 
when restricted to those sets that are maximal and consistent (this does not change the minimum 
HS^i? )!!), is equivalent to the following problem: decompose (1) into the negative feedback loop of 
an orthant monotone control system, using the second algorithm above, and using as few inputs as 
possible. This produces the following problem formulation. 

Problem 2 (Directed Labeling Problem(DLP)) : 

An instance of this problem is (G,h) where G = (V,E) is a directed graph and h: E — > {0, 1}. A 
valid solution is a vertex labeling function f:V^ {0, 1}. Define an edge (u,v) G E to be consistent 
iff h(u,v) = (f(u) + f(v)) (mod 2). The objective is then to find a valid solution minimizing 
\g(E — F)\ where g(C) = {u G V \ 3y £ V, (u, y) G C} for any C C E and F is the set of consistent 
edges. 
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3.1 Summary of Key Concepts and Results in Approximation Algo- 
rithms 

For any 7 > 1 (resp. 7 < 1), a '-/-approximate solution (or simply an 7-approximation) of a 
minimization (resp., maximization) problem is a solution with an objective value no larger than 7 
times (resp., no smaller that 7 times) the value of the optimum, and an algorithm achieving such 
a solution is said to have an approximation ratio of 7. 

In [38] Papadimitriou and Yannakakis defined the class of MAX-SNP optimization problems and 
a special approximation-preserving reduction, the so-called L-reduction, that can be used to show 
MAX-SNP-hardness of an optimization problem. The version of the L-reduction that we provide 
below is a slightly modified but equivalent version that appeared in [10]. 

Definition 5 [10, 38] Given two optimization problems U and W, we say that U L-reduces to IT' 
if there are three polynomial-time procedures T 1; T 2 , T 3 and two constants a and b > such that 
the following two conditions are satisfied: (1) For any instance I ofU, algorithm T\ produces an 
instance V = f(I) of TV generated from T\ such that the optima of I and I' , OPT {I) and OPT {I'), 
respectively, satisfy OPT {I') < a ■ OPT {I). (2) For any solution of I' with cost d, algorithm T2 
produces another solution with a cost c" no worse than d , and algorithm T 3 produces a solution of I of 
IT with costc (possibly from the solution produced byT 2 ) satisfying \c — OPT(I)\ < b-\c" — OPT(I')\. 

An optimization problem is MAX-SNP-/iard if any problem in MAX-SNP L-reduces to that problem. 
The importance of proving MAX-SNP-hardness results comes from a result proved by Arora et al. [8] 
which shows that, assuming P^NP, for every MAX-SNP-hard minimization (resp., maximization) 
problem there exists a constant e > such that no polynomial time algorithm can achieve an 
approximation ratio better than 1 + e (resp., better than 1 — e). 

A special case of the ULP problem, namely when h(e) = 1 for all e £ E, is the well-known MAX- 
CUT problem. An instance of this problem is an undirected graph G = (V,E). A valid solution is a 
set S C V. The objective is to find a valid solution that maximizes the number of edges {u, v} £ E 
such that \{u,v} n S\ = 1. The MAX-CUT problem is known to be MAX-SNP-hard. For further 
details on these topics, the reader is referred to the excellent book by Vazirani [49] . 
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3.2 Notations and Terminology 



The following notations are used in the rest of the paper. V(G) and E(G) are the vertex set and 
edge set of graph G, respectively, and G is the underlying undirected graph of a directed graph 
G obtained by ignoring the directions of the edges. For SCI/, G(S) denotes the subgraph of G 
vertex-induced by S, and E ou ^ (S) = {(u,v) G E{G) \ u G S} is the set of out-bound edges of 
vertices in S. OPTp(I) denotes the size of an optimal solution for a problem P with instance I. 
The length of a circuit c with respect to weight function w. E i— > M. is defined as ^iu(e); if no 

eSc 

weight function is specified, then w(e) = 1 for all e G E is assumed. 

4 Theoretical Results 

Our theoretical results are summarized as follows. 

Theorem 6 (a) For some constant e > 0, it is not possible to approximate in polynomial time the 
ULP and the DLP problems to within an approximation ratio ofl—e and l + e, respectively, unless 
P=NP. 

(b) For ULP , we provide a polynomial time a -approximation algorithm where a ~ 0.87856 is the 
approximation factor for the MAX- CUT problem obtained in [20] via semidefinite programming. 

(c) For DLP, if d r ^ ix denotes the maximum in-degree of any vertex in the graph, then we give a 
polynomial-time approximation algorithm with an approximation ratio of at most d 1 ™ ix ■ 0(log |V|). 

Our computational results are illustrated in Section 6 by an implementation of the algorithms 
applied to a 13-node Drosophila segmentation network, as well as to a 200 + node recently published 
network of the Epidermal Growth Factor Receptor pathway. 

Remark 1 It should be noted that the complexity of ULP becomes tractable if the network is biased 
significantly towards excitatory connections. Obviously, if all the edges of the given graph G = (V, E) 
are labeled 0, then it is possible to label the vertices such that all the edges are consistent. Moreover, 
given any graph G, it is easy to check in 0((\V\ + l-El) 3 ) time if an optimal solution contains all 
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the edges as consistent by solving a set of linear equations via Gaussian elimination. Now, suppose 
that at most L of the edges of G are labeled 1. Then, obviously at most L inconsistent edges 
exist in any optimal solution. Thus a straightforward way to solve the problem is to consider all 
possible subsets of edges in which at most L edges are dropped and checking, for each such subset, 
if there is an optimal solution that contains all the edges as consistent. The total time taken is 
0(\V\ 2L . ■ (\V\ + |-E|) 3 ), which is a polynomial in \V\ + \E\ if L is a constant. 

5 Proof of Theorem 6 

This section provides the proof of Theorem 6, broken up into a series of technical parts. 
5.1 Proof of Theorem 6(a) 

Based on the discussion in Section 3.1, it suffices to show that both these problems are MAX-SNP- 
hard. ULP is MAX-SNP-hard since its special case, the MAX-CUT problem, is MAX-SNP-hard. 
To prove MAX-SNP-hardness of DLP, we need the definitions of the following two problems. 

Problem 3 (Node Deletion Problem with Bipartite Property (NDBP)) : 

An instance of this problem is an undirected graph G = (V,E). A valid solution is a vertex set 
S C V, such that G(V — S) is a bipartite graph. The objective is to find a valid solution minimizing 
\S\. 

Problem 4 (Variance of Node Deletion Problem (VNDP)) An instance of this problem is 
(G, h) where G = (V, E) is a directed graph and h: E — > {0, 1}. A valid solutions is a vertex set S C 
V with the following property: if Gs = (Vs, E$) is the graph with Vs — V and E$ = E — E ou ^ (S), 
then Gs is free of odd length circuit with respect to weight function h. The objective is to find a 
valid solution minimizing \S\. 

First, we note that DLP is equivalent to VNDP. If one identifies the solution set S in UNDP 
with the solution set g(E — E) in DLP, then the set of consistent edges F in DLP corresponds to the 
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Es in UNDP since every edge (u,v) G F satisfying h(u,v) = (f(u) + fiy)) (mod 2) is equivalent 
to stating that Gs is free of odd length circuit with respect to weight function h. 

Thus, to prove the MAX-SNP-hardness of DLP it suffices to prove that of VNDP. NDBP is 
known to be MAX-SNP-hard [29] . We provide a L-reduction from NDBP to VNDP. For an instance 
of VNDP with graph G = (V,E), construct an instance of DLP with instance (G',h) as follows 
(note that G' is a digraph): V = V(G') = V U {A U>V ,B U , V | {u,v} G E}, E' = E{G') = 
{(it, A UjV ), (A u>v , B u>v ), (v, B u >v ) {u,v} G E}, and h(e) = 1 for all e G E' Now, the following 
holds: 

(1) If S is a solution to NDBP, it is also a solution to the generated instance of UNDP. The 
reason is as follows. Notice that every odd length (resp., even length) circuit C in G corresponds 
to an odd length (resp., even length) circuit C in G' with respect to the weight function h. Since 
G(V — S) is a bipartite graph, it is free of odd length circuits. So for each odd length cycle C of G, 
there exists it G S such that the deletion of all out-bound edges of it in G' breaks its corresponding 
odd length cycle C . 

(2) If 5" is a solution to UNDP, then we can construct a solution S of NDBP in the following 
manner: for each x G S'\ 

if x = A U:V , add it to T; 
if x = B U;V , add v to T; 
if x — u or x — v, add x to T. 

It is now easy to see that since the graph Gs' is free of odd length circuit with respect to h, G(V — S) 
has no odd length circuit either. 

Hence, we have OPT UNDP (G' , h) < OPT NDBP {G). Moreover, given a solution S' of UNDP, 
we are able to generate a solution S of NDBP such that 

\\S\ - OPT NDBP (G)\ < \\S'\ - OPT UNDP (G', h)\. 

Thus, our reduction satisfies the Definition 5 of a L- reduction with a = b = 1. 
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5.2 Proof of Theorem 6(b) 



Our algorithm for ULP uses the semidefmite programming (SDP) technique used by Goemans and 
Williamson in [20]; hence we use notations and terminologies similar to that used in the paper 
(readers not very familiar with this technique are also referred to the excellent explanation of this 
technique in the book by Vazirani [49]). For each vertex v G V, we have a real vector x v G 
with ||x„||2 = 1. Then, we can generate from ULP the following vector program (where • denotes 
the vector inner product): 

Solve the following vector program via SDP methods: 

maximize ~ Yl (1 ~ x u ' %v) + \ Yl 0- + x u " x v) 

h(u,v)=l h(u,v)=0 

subject to: for each v G V: x v ■ x v = 1 for each v G V: x v G M) v \. 
Select a uniformly random vector r in the | ^-dimensional unit sphere and set 

if r ■ x v > 

f(v) = { 

1 otherwise 

This proof of the claimed approximation performance of the above vector program is obtained by 
adapting the proof in Section 26.5 of [49] for the MAX-2SAT problem to deal with fact that, in our 
problem, opposed to a different set of values in [49]. Since there are some subtleties 

in adapting that proof for readers unfamiliar with this approach, we provide a sketch of the proof in 
the appendix. The procedure can be derandomized via methods of conditional probabilities {e.g., 
see [30]). 

5.3 Proof of Theorem 6(c) 

For an instance of (G, h) of DLP, construct instance {G' = {V, E'), h') as follows: 

V' = VU{C U)V I {u, v) G E & h{u, v) = 0}, 
E' = {e | e G Ek h{e) = 1} U {{u,C U)V ), {C u , v , v) | {u, v) G E k h{u, v) = 0}, 

and 

h'{e) = 1 for all e G E' . 
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Note that every odd (resp., even) length circuit in G with respect to weight function h corresponds 
to an odd (resp., even) length circuit in G' with respect to weight function h', and vice versa. Let F 
is a set of consistent edges in (G, h) with a vertex labeling function /. Now, observe the following: 

(1) F 1 is a set of consistent edges in (G', h') with a vertex labeling function /' with f'(x) = f(x) 
for x G V fl V and f'(C UjV ) = f(u) = f(v) for an edge (u,v) G F with h(u,v) = 0; thus, an 
edge (u, v) in F correspond to an edge (u, v) in F' if h(u, v) = 1 and correspond to a pair of edges 
(u,C u>v ), (C UjV ,v) in F' if h(u,v) = 0. 

(2) If (u,v) G E — F is an inconsistent edge in (G, h), then the edge (C U)V ,v) in G' can always be 
made consistent by choosing f'(C UjV ) = f(v). 

Thus, if F" is the set of consistent edges obtained from F following rules (1) and (2) above, then 
\g{E'-F")\ = \g(E-F)\ and thus OPT DLP (G', ti) = OPT DLP (G,h). Consider the NDBP problem 
on G'. Any solution to DLP on (C, h') with vertex labeling function /' and set of consistent edges 
F' cannot contain an odd cycle of consistent edges and thus provides a solution to NDBP on G' of 
size \g(E'-F')\. Thus, OPT^^G 7 ) < OPT DLP (G' , h') = OPT DLP (G,h). OPT NDBP (G') can be 
approximated in polynomial time to within an approximation ratio of 0(log \ V |) [29], i.e., we can 
find a solution Sndbp{G') in polynomial time such that \Sndbp(G')\ < 0(log \V \)-OPT NDBP (G') < 
0(log |V|) - OPT DLP (G, h). Now, S DLP (G, h) = S NDBP (G') U {u \ 3v e S NDBP (G'), (u, v) G E}, is 
obviously a solution to DLP on (G,h). Remember that d^ x denotes the maximum in-degree of 
any vertex in G. Thus, \S DLP (G, h)\ < d™* ■ \S NDBP (G')\ < d™ ax ■ 0(log \V\) ■ OPT DLP (G, h). 

6 Two Examples of Applications of the ULP Algorithm 

We have implemented the SDP-based algorithm for calculating approximate solutions of the undi- 
rected labeling problem using Matlab, and we illustrate this algorithm with two applications to 
biological systems. The first application concerns the relatively small-scale 13-variable digraph of 
a model of the Drosophila segment polarity network. The second application involves a digraph 
with 300+ variables associated to the human Epidermal Growth Factor Receptor (EGFR) signaling 
network. This model was published recently and built using information from 242 published papers. 
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Figure 5: A digram of the Drosophila embryo during early development. A part of the segment 
polarization process is displayed. Courtesy of N. Ingolia and PLoS [25] 

6.1 Drosophila Segment Polarity 

An important part of the development of the early Drosophila (fruit fly) embryo is the differentiation 
of cells into several stripes (or segments), each of which eventually gives rise to an identifiable part 
of the body such as the head, the wings, the abdomen, etc. Each segment then differentiates 
into a posterior and an anterior part, in which case the segment is said to be polarized. (This 
differentiation process continues up to the point when all identifiable tissues of the fruit fly have 
developed.) Differentiation at this level starts with differing concentrations of certain key proteins in 
the cells; these proteins form striped patterns by reacting with each other and by diffusion through 
the cell membranes. 

A model for the network that is responsible for segment polarity [13] is illustrated in Figure 6. 
As explained above, this model is best studied when multiple cells are present interacting with each 
other. But it is interesting at the one-cell level in its own right — and difficult enough to study 
that analytic tools seem mostly unavailable. The arrows with a blunt end are interpreted as having 
a negative sign in our notation. Furthermore, the concentrations of the membrane-bound and 
inter-cell traveling compounds PTC, PH, HH and WG(membrane) on all cells have been identified 
in the one-cell model (so that, say, HH— * PH is now in the digraph). Finally, PTC acts on the 
reaction CI— > CN itself by promoting it without being itself affected, which in our notation means 
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A 



Interactions 




Figure 6: The network associated to the Drosophila segment polarity, as proposed in [13], Courtesy 
of N. Ingolia and PLoS. The three edges that have been crossed have been chosen in order to let 
the remaining edges form an orthant monotone system. 

PTCi CN and PTC ^ CI. 



The Implementation The Matlab implementation of the algorithm on this digraph with 13 
nodes and 20 edges produced several partitions with as many as 17 consistent edges. One of these 
possible partitions simply consists of placing the three nodes ci, CI and CN in one set and all 
other nodes in the other set, whereby the only inconsistent edges are CL^ wg, CL — > ptc, and 
PTC — > CN. But note that it is desirable for the resulting open loop system to have as simple 
remaining loops as possible after eliminating all inconsistent edges. In this case, the remaining 
directed loops 

EN A ci -i CI -i CN ^eni EN 
EN^cii CI iCN ^wgiWG ^ WG(membrane) ien^EN 

can still cause difficulties. 

A second partition which generated 17 consistent edges is that in which EN, hh, CN, and the 
membrane compounds PTC, PH, HH are on one set, and the remaining compounds on the other. 
The edges cut are ptc — » PTC, CI — > CN and en — > EN, each of which eliminates one or several 
positive loops. By writing the remaining consistent digraph in the form of a cascade, it is easy to 
see that the only loop whatsoever remaining is wg <-» WG; this makes the analysis proposed in [18] 
easier. 
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In this relatively low dimensional case we can prove that in fact OPT = 17, as the results below 
will show. 

Lemma 7 Any partition of the nodes in the digraph in Figure 6 generates at most 17 consistent 
edges. 

Proof. From Lemma 3, a simple way to prove this statement is by showing that there are three 
disjoint cycles with odd weighted length in the network associated to Figure 6 (disjoint in the sense 
that no edge is part of more than one of the cycles). Such three disjoint cycles exist in this case, 
and they are CI-CN-wg, CI-ptc-PTC, CN-en-EN-hh-HH-PH-PTC. □ 

Multiple Copies 

It was mentioned above that the purpose of this network is to create striped patterns of protein 
concentrations along multiple cells. In this sense, it is most meaningful to consider a coupled 
collection of networks as it is given originally in Figures 5 and 6. Consider a row of k cells, each of 
which has independent concentration variables for each of the compounds, and let the cell-to-cell 
interactions be as in Figure 6 with cyclic boundary conditions (that is, the k-th cell is coupled with 
the first in the natural way). We show that the results can be extended in a very similar manner 
as before. 

Given a partition fy of the 1-cell network considered above, let fy be the partition of the /c-cell 
network defined by Jy (en,) := /y(en) for every i, etc. Thus fy consists of k copies of the partition 
fy in a natural way. 

Lemma 8 Let fy be a partition of the nodes of the 1-cell network with n consistent edges. Then 
with respect to the partition fy, there are exactly kn consistent edges for the k-cell coupled model. 

Proof. Consider the network consisting of k isolated copies of the network, that is, k groups of 
nodes each of which is connected exactly as in the 1-cell case. Under the partition fy, this network 
has exactly kn consistent edges. To arrive to the coupled network, it is sufficient to replace all 
edges of the form (HHj, PH,;) by (HH i+ i, PFL) and (WGj, em) by (WGj+i, enj), z = 1 ... A: (where we 
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CN 
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Figure 7: A sub-digraph of the network in Figure 6, using the notation defined in the previous sec- 
tions. Note that this sub-digraph doesn't include any of the two edges (WGmem,en) and (HH,PH), 
which connect the networks of different cells in Figure 6; this will be important in the proof of 
Lemma 9. 



identify k + 1 with 1). Since by definition / v (HH i+ i) = /y(HILj and / v (WG i+ i) = ^(WQ), the 



consistency of these edges does not change, and the number of consistent edges therefore remains 



In particular, OPT> 17k for the coupled system. The following result will establish an upper 
bound for OPT. 

Lemma 9 Any partition of the nodes in the digraph in the k-cell coupled network generates at most 
17k consistent edges. 

Proof. Consider the signed graph in Figure 7, which is a sub-digraph of the network associated 
to Figure 6. Since the inter-cell edges (WGmem,en) and (HH,PH) are not in this graph, it follows 
that there are k identical copies of it in the k-ce\\ model. If it is shown that at least three edges 
need to be cut in each of these k sub-digraphs, the result follows immediately. 

Consider the negative cycle ci-CI-wg-CN-en-EN, which must contain at least one inconsistent 
edge for any given partition. The remaining edges of the subgraph form a tetrahedron with four 
negative parity triangles, which cannot all be cut by eliminating any single edge. If follows that 



constant. 



□ 
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no two edges can eliminate all negative parity cycles in this signed graph, and that therefore 
20k — 3k = 17k is an upper bound for the number of consistent edges in the fc-cell network. □ 

Corollary 10 For the k-cell linearly coupled network described in Figure 6, it holds OPT=17k. 
Proof. Follows from the previous two results. □ 

6.2 EGFR Signaling 

The protein called epidermal growth factor is frequently stored in epithelial tissues such as skin, 
and it is released when rapid cell division is needed (for instance, it is mechanically triggered 
after an injury). Its function is to bind to a receptor on the membrane of the cells, aptly called 
the epidermal growth factor receptor. The EGFR, on the inner side of the membrane, has the 
appearance of a scaffold with dozens of docks to bind with numerous agents, and it starts a reaction 
of vast proportions at the cell level that ultimately induces cell division. 

In their May 2005 paper [37], Oda et al. integrate the information that has become available 
about this process from multiple sources, and they define a network with 330 known molecules 
under 211 chemical reactions. The network itself is available from the supplementary material in 
SBML format (Systems Biology Markup Language, www.sbml.org), and will most likely be subject 
to continuous updates. 

The Implementation Each reaction in the network classifies the molecules as reactants, prod- 
ucts, and/or modifiers (enzymes). We imported this information into Matlab using the Systems 
Biology Toolbox, and constructed a digraph G in our notation by letting sign(z, j) = 1 if there exists 
a reaction in which j is a product and i is either a reactant or a modifier. We let sign(z, j) = —1 if 
there exists a reaction in which j is a reactant, and i is also either a reactant or a modifier. Similarly 
sign(i, j) = if the nodes i,j are not simultaneously involved in any given reaction, and sign(i, j) 
is undefined (NaN) if the first two conditions above are both satisfied. 

An undefined edge can be thought of as an edge that is both positive and negative, and it can 
be dealt with, given an arbitrary partition, by deleting exactly one of the two signed edges so that 
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the remaining edge is consistent. Thus, in practice, one can consider undefined edges as edges with 
sign 0, and simply add the number of undefined edges to the number of inconsistent edges in the 
end of each procedure, in order to form the total number of inputs. This is the approach followed 
here; there are exactly 7 such entries in the digraph G. 

The Results After running the algorithm 100 times for this problem, and choosing that partition 
which produced the highest number of consistent edges, the induced consistent set contained 633 out 
of 852 edges (ignoring the edges on the diagonal and the 7 undefined edges). See the supplementary 
material for the relevant Matlab functions that carry out this algorithm. A procedure analogous to 
that carried out for system ( 5) allows to decompose the system as the feedback loop of a controlled 
monotone system using 852 — 633 = 219 inputs. Since the induced consistent set is maximal by 
definition, Proposition 2 guarantees that the function h is a negative feedback. 

Contrary to the previous application, many of the reactions involve several reactants and prod- 
ucts in a single reaction. This induces a denser amount of negative and positive edges: even though 
there are 211 reactions, there are 852 (directed) edges in the 330 x 330 graph G. It is very likely 
that this substantially decreases OPT for this system. 

The approximation ratio of the SDP algorithm is guaranteed to be at least 0.87 for some r, 
which gives the estimate OPT<^ 633/0.87 ~ 728 (valid to the extent that r has sampled the right 
areas of the 330-dimensional sphere, but reasonably accurate in practice). 

One procedure that can be carried out to lower the number of inputs is a hybrid algorithm 
involving out-hubs, that is, nodes with an abnormally high out-degree. Recall from the description 
of the DLP algorithm that all the out-edges of a node Xj can be potentially cut at the expense 
of only one input u, by replacing all the appearances of Xi in fj{x), j ^ i, by u. We considered 
the k nodes with the highest out-degrees, and eliminated all the out-edges associated to these hubs 
from the reaction digraph to form the graph G\. Then we run the ULP algorithm on G\ to find a 
partition fy of the nodes and a set of m edges that can be cut to eliminate all remaining negative 
closed chains. Finally, we put back on the digraph those edges that were taken in the first step, and 
which are consistent with respect to the partition fy. The result is a decomposition of the system 
as the negative feedback loop of a controlled monotone system, using at most k + m edges. 
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An implementation of this algorithm with k = 60 yielded a total maximum number of inputs 
k + m — 137. This is a significant improvement over the 226 inputs in the original algorithm. 
Clearly, it would be worthwhile to investigate further the problem of designing efficient algorithms 
for the DLP problem to generate improved hybrid algorithmic approaches. The approximation 
ratios in Theorem 6(c) are not very satisfactory since d^ x and log \ V\ could be large factors; hence 
future research work may be carried out in designing better approximation algorithms. 

We conclude with another, more tentative way to drastically reduce the number of inputs nec- 
essary to write this system as the negative closed loop of a controlled monotone system. The idea 
is to make suitable changes of variables in the original system using the mass conservation laws. 
Such changes of variables are discussed in many places, for example in [50] and [5]. In terms of the 
associated digraph, the result of the change of variables is often the elimination of one of the closed 
chains. The simplest target for a suitable change of variables is a set of three nodes that form part 
of the same chemical reaction, for instance two reactants and one product, or one reactant, one 
product and one modifier. It is easy to see that such nodes are connected in the associated digraph 
by an odd length triangle of three edges. 

In order to estimate the number of inputs that can potentially be eliminated by suitable changes 
of variables, we counted pairwise disjoint, odd length triangles in the digraph of the EGFR network. 
Using a greedy algorithm to find and tag disjoint negative feedback triangles, we found a maximal 
number of them in the subgraph associated to each of the 211 chemical reactions. Special care was 
taken so that any two triangles from different reactions were themselves disjoint. After carrying 
out this procedure we found 196 such triangles in the EGFR network. This is a surprisingly high 
number, considering that each of these triangles must have been opened in the ULP algorithm 
implementation above and that therefore each triangle must contain one of the 226 edges cut. To 
the extent to which most of these triangles can be eliminated by suitable changes of variables, 
this can yield a much lower number of edges to cut, and it could provide a way to thus stress the 
underlying structure of the system. 
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7 Supplementary Material: MATLAB Implementation Files 



A set of MATLAB programs have been written to implement the algorithms described in this paper. 
They can be accessed from the URL http : //www. math, rutgers . edu/~sontag/desz_README.html. 
The appendix contains more details about it. 
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APPENDIX 



A More Details on SDP Algorithm 

In this appendix, we provide details regarding the proof of the SDP algorithm for Theorem 6(b) 
described in Section 5.2. The proof method is similar to that used in better-known problems. For 
simplicity, we do not describe the derandomization methods and provide a proof for the expected 
approximation ratio only. Define the following notations for convenience: 

• The vertex set V of the graph for ULP is simply {1, 2, . . . , | V|}; 

• /opt is an optimal vertex labeling for ULP with Fqpt being the set of consistent edges; 

• SDPqpt is the maximum value of the objective value of the vector program 

maximize^ Yl (1 — x u • x v ) + \ Yl (l + x u -x v ) 

h(u,v)=l h(u,v)=0 

subject to: for each v G V: x v ■ x v = 1 
for each v G V: x v e M |y| 

It is easy to see that SDPqpt > I-^optI as follows. For every v G V if /opt( u ) = then set 

x v 

whereas if /opt(^) = 1 then set 

x v = (-1,0,0 0) 

this provides a solution for the vector program with an objective value of precisely |-Fopt|- Thus, 
it suffices if we prove our claim on the approximation ratio relative to SDPqpt 

Next, note that the vector program can indeed be solved by a SDP approach. Let Y G Ml y l x l y l 
be an unknown real matrix with yij denoting the (i, j) th element of Y . It is not difficult to see (via 
Cholesky decomposition for real symmetric matrices) that the above vector program is equivalent 
to the following semidefinite programming problem: 
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maximize | ^ (1- y UjV ) + \ i 1 + Vu,v) 

h(u,v)=l h(u,v)=0 

subject to: for each v G V: y v<v = 1 

Y is a positive semidefinite matrix 

Such a problem can be solved in polynomial time within an additive error of any constant e > 
via ellipsoid, interior-point or convex-programming methods [1,21,35,36,48]. 

Let 9 U>V denote the angle between the two vectors x u , x v G in an optimal solution of the 
vector program. Then, using standard trigonometric results, 



Let W be the expected value of the number of consistent edges of ULP after we have performed 
the randomized rounding step, namely the step: 

select a uniformly random vector r in the |y|-dimensional unit sphere; 




h(u,v)=l 



h(u,v)=0 




Then, via linearity of expectation, it follows that 



E[W]= *M/(«) ^ /(«)] + E *M/(«) = /(«)]• 



h(u,v)=l h(u,v)=0 

Because the vector r was chosen randomly, it is true that 



Pr[/(«) + /(«)] 







and Pr[/(«) = f(v)} = 1 
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u,v 



TV 



Thus, 



E[W] 





h(u,v)=l 



h(u,v)=0 



A • SDP 



OPT 



where 




can be shown to satisfy A > 0.87856 using elementary calculus. 
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A.l Proof of Lemma 3 



Proof. Suppose that the system is monotone with respect to <f v , that is, 

fv(i)fv(j)fE(i,j) = 1 for all i ^ j. 

(by Lemma 2). Let V(G) = A U B, where i G A if fv(i) = 1, and i G i? otherwise. Note that by 
hypothesis = 1 if X{,Xj G A or if Xi,Xj G -B. Also, j) = —1 if £j G A, G B or vice 

versa. Noting that every closed chain in G must cross an even number of times between A and B, 
it follows that every closed chain has parity 1. 

Conversely, let all closed chains in G have parity 1. We define a function fy as follows: consider 
the partition of V(G) induced by letting % ~ j if there exists an undirected open chain joining i and 
j. Pick a representative of every equivalence class, and define fv(ik) = 1, k = 1, . . . , K. Next, 
given an arbitrary vertex i and the representative of its connected component, define fv(i) as the 
parity (+1 of —1) of any undirected open chain joining ik with i. To see that this function is well 
defined, note that any two chains joining i and j can be put together into a closed chain from ik to 
itself, which has parity 1 by hypothesis. Thus the parity of both open chains must be the same. 

Let now i,j be arbitrary different vertices. If dFj/dxi = 0, then (2) is satisfied for otherwise 
there is an edge joining i with j. By construction of the "potential" function fy, it holds that if 
fv(i) = fv(j) then fE^hj) = 1, i-e., dFj/dxi > 0, and so (2) holds as well. If fv(i) ^ fv(j), then 
fE{i,j) — — 1; i-e. dFj/dxi < 0. In that case (2) also holds, and the proof is complete. □ 



B Supplementary Material: MATLAB Implementation Files 
(more details) 

A set of MATLAB programs have been written to implement the algorithms described in this paper. 
They can be accessed from the following URL: 

http : //www . math . rutgers . edu/~sontag/desz_README . html 
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The files in this directory are MATLAB functions and scripts in .m format. They can be opened 
using any text editor, and each contains descriptions regarding its purpose and use. Two useful 
packages to be used when running these functions are: 

1. The Systems Biology Toolbox for MATLAB, which allows for networks in SBML format to 
be imported into the MATLAB environment. This toolbox also allows for processing of the 
MATLAB structures as well as the creation of SBML format files from MATLAB structures. 
It can be downloaded at http://sbml.org/. 

2. The SeDuMi Optimization Toolbox, one of the most popular implementations of the SDP 
algorithm for MATLAB. It is freely available for download at http : / /sedumi . mcmaster . ca/. 

The most important functions in this directory are listed below: 

(i) React ionDigraph.m: this function receives a model in SBML format and produces the associ- 

ated reaction digraph associated to the reaction. 

(ii) RepeatPartition.m: this function produces a partition p which optimizes the number of 

consistent edges of a given signed digraph G. It implements the SDP-based ULP algorithm. 

(iii) DLPtrim.m: this function implements the hybrid ULP-DLP algorithm mentioned in the end 
of the discussion of the SGFR network. 

(iv) PlunderNTriangle .m: this function uses a greedy algorithm to eliminate odd parity, pairwise 
disjoint triangles from a given subgraph of a signed digraph G (to be used in connection to 
the discussion regarding changes of variables to eliminate inputs in the decomposition). 
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This figure "odell_network_from_ingolia_cut.jpg" is available in "jpg" format from: 



http://arXiv.org/ps/q-bio/0509040vl 



